What the F-measure doesn't measure: Features, Flaws, Fallacies and Fixes

Author

  • David M. W. Powers
Abstract

The F-measure or F-score is one of the most commonly used "single number" measures in Information Retrieval, Natural Language Processing and Machine Learning, but it is based on a mistake, and the flawed assumptions render it unsuitable for use in most contexts! Fortunately, there are better alternatives…

What the F-measure is!

F-measure, sometimes known as F-score or (incorrectly) the F1 metric (the β=1 case of the more general measure), is a weighted harmonic mean of Recall & Precision (R & P). There are several motivations for this choice of mean. In particular, the harmonic mean is commonly appropriate when averaging rates or frequencies, but there is also a set-theoretic reason we will discuss later. The most general form, Fβ, allows differential weighting of Recall and Precision, but commonly they are given equal weight, giving rise to F1. Because this case is so ubiquitous, it is usually what is meant when people refer to the F-measure.
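The weighted harmonic mean described above can be sketched in Python as follows. This is a minimal sketch, not the author's code. It assumes the common parameterisation Fβ = (1 + β²)·P·R / (β²·P + R), where β > 1 weights Recall more heavily and β < 1 weights Precision more heavily. The function names (`f_beta`, `weighted_harmonic_mean`, `f_beta_from_counts`) are hypothetical. Setting α = 1/(1 + β²) as the weight on Precision shows that this is the same quantity as the weighted harmonic mean 1 / (α/P + (1 − α)/R).

```python
def weighted_harmonic_mean(precision: float, recall: float, alpha: float) -> float:
    """Weighted harmonic mean 1 / (alpha/P + (1 - alpha)/R).

    alpha is the (hypothetical, illustrative) weight placed on Precision.
    """
    if precision == 0 or recall == 0:
        # A harmonic mean collapses to 0 when either rate is 0.
        return 0.0
    return 1.0 / (alpha / precision + (1 - alpha) / recall)


def f_beta(precision: float, recall: float, beta: float = 1.0) -> float:
    """F-measure under the assumed parameterisation
    F_beta = (1 + beta^2) * P * R / (beta^2 * P + R).

    beta = 1 gives F1, the equal-weight case.
    """
    if beta <= 0:
        raise ValueError("beta must be positive")
    if precision == 0 or recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def f_beta_from_counts(tp: int, fp: int, fn: int, beta: float = 1.0) -> float:
    """Compute F_beta from true positive, false positive and false negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0  # P = TP / (TP + FP)
    recall = tp / (tp + fn) if tp + fn else 0.0     # R = TP / (TP + FN)
    return f_beta(precision, recall, beta)


if __name__ == "__main__":
    p, r = 0.5, 1.0
    print("F1:", round(f_beta(p, r), 4))
    print("arithmetic mean:", (p + r) / 2)
    # With beta = 2, alpha = 1 / (1 + 4) = 0.2, so both forms agree.
    print("F2:", round(f_beta(p, r, beta=2.0), 4))
    print("same via alpha:", round(weighted_harmonic_mean(p, r, 1 / 5), 4))
```

For example, with P = 0.5 and R = 1.0, `round(f_beta(0.5, 1.0), 4)` gives 0.6667. That is below the arithmetic mean of 0.75, because the harmonic mean is pulled towards the smaller of the two rates.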

Journal title:
  • CoRR

Volume: abs/1503.06410

Publication date: 2015